An Overview of Econometrics Packages in R
Base R ships with a lot of functionality useful for computational econometrics, in particular in the stats package. This functionality is complemented by many packages on CRAN; a brief overview is given below.
There is also a considerable overlap between the tools for econometrics in this view and those in the task views on Finance, SocialSciences, and TimeSeries. Furthermore, the Finance SIG is a suitable mailing list for obtaining help and discussing questions about both computational finance and econometrics.
The packages in this view can be roughly structured into the following topics. If you think that some package is missing from the list, please contact the maintainer.
Basic linear regression
· Estimation and standard inference: Ordinary least squares (OLS) estimation for linear models is provided by lm() (from stats) and standard tests for model comparisons are available in various methods such as summary() and anova().
· Further inference and nested model comparisons: Functions analogous to the basic summary() and anova() methods that also support asymptotic tests (z instead of t tests, and Chi-squared instead of F tests) and plug-in of other covariance matrices are coeftest() and waldtest() in lmtest. Tests of more general linear hypotheses are implemented in linearHypothesis() and for nonlinear hypotheses in deltaMethod() in car.
· Robust standard errors: HC and HAC covariance matrices are available in sandwich and can be plugged into the inference functions mentioned above.
· Nonnested model comparisons: Various tests for comparing non-nested linear models are available in lmtest (encompassing test, J test, Cox test). The Vuong test for comparing other non-nested models is provided by nonnest2 (and specifically for count data regression in pscl).
· Diagnostic checking: The packages car and lmtest provide a large collection of regression diagnostics and diagnostic tests.
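As a minimal sketch of the workflow above, the following uses only base R and the built-in mtcars data (chosen here purely for illustration; it is not part of the original overview). It fits two nested OLS models with lm(), compares them with anova(), and then computes a basic HC0 heteroscedasticity-consistent covariance matrix by hand. In practice the sandwich and lmtest packages mentioned above cover this and much more; the manual calculation is only meant to show what gets plugged into the inference functions.

```r
# Illustrative data: the built-in mtcars data set (an assumption of this sketch).
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp, data = mtcars)

# Standard inference: coefficient table and nested F test.
print(summary(fit_large))
nested_test <- anova(fit_small, fit_large)
print(nested_test)

# Hand-rolled HC0 "sandwich" covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1.
X <- model.matrix(fit_large)
e <- residuals(fit_large)
bread <- solve(crossprod(X))      # (X'X)^-1
meat <- crossprod(X * e)          # sum over i of e_i^2 * x_i x_i'
vcov_hc0 <- bread %*% meat %*% bread
se_hc0 <- sqrt(diag(vcov_hc0))

# Classical versus robust standard errors side by side.
print(cbind(classical = sqrt(diag(vcov(fit_large))), HC0 = se_hc0))
```

The same covariance matrix could then be passed to a function that accepts a user-supplied vcov, which is the "plug-in" idea described above.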
Microeconometrics
· Generalized linear models (GLMs): Many standard microeconometric models belong to the family of generalized linear models and can be fitted by glm() from package stats. This includes in particular logit and probit models for modeling choice data and Poisson models for count data. Effects for typical values of regressors in these models can be obtained and visualized using effects. Marginal effects tables for certain GLMs can be obtained using the mfx and margins packages. Interactive visualizations of both effects and marginal effects are possible in LinRegInteractive.
· Binary responses: The standard logit and probit models (among many others) for binary responses are GLMs that can be estimated by glm() with family = binomial. Bias-reduced GLMs that are robust to complete and quasi-complete separation are provided by brglm. Discrete choice models estimated by simulated maximum likelihood are implemented in Rchoice. Heteroscedastic probit models (and other heteroscedastic GLMs) are implemented in glmx along with parametric link functions and goodness-of-link tests for GLMs.
· Count responses: The basic Poisson regression is a GLM that can be estimated by glm() with family = poisson as explained above. Negative binomial GLMs are available via glm.nb() in package MASS. Another implementation of negative binomial models is provided by aod, which also contains other models for overdispersed data. Zero-inflated and hurdle count models are provided in package pscl. A reimplementation by the same authors is currently under development in countreg on R-Forge, which also encompasses separate functions for zero-truncated regression, finite mixture models, etc.
· Multinomial responses: Multinomial models with individual-specific covariates only are available in multinom() from package nnet. Implementations with both individual- and choice-specific variables are mlogit and mnlogit. Generalized multinomial logit models (e.g., with random effects) are in gmnl. Generalized additive models (GAMs) for multinomial responses can be fitted with the VGAM package. A Bayesian approach to multinomial probit models is provided by MNP. Various Bayesian multinomial models (including logit and probit) are available in bayesm. Furthermore, the package RSGHB fits various hierarchical Bayesian specifications based on direct specification of the likelihood function.
· Ordered responses: Proportional-odds regression for ordered responses is implemented in polr() from package MASS. The package ordinal provides cumulative link models for ordered data, which encompass proportional odds models but also include more general specifications. Bayesian ordered probit models are provided by bayesm.
· Censored responses: Basic censored regression models (e.g., tobit models) can be fitted by survreg() in survival; a convenience interface tobit() is in package AER. Further censored regression models, including models for panel data, are provided in censReg. Interval regression models are in intReg. Censored regression models with conditional heteroscedasticity are in crch. Furthermore, hurdle models for left-censored data at zero can be estimated with mhurdle. Models for sample selection are available in sampleSelection and semiparametric extensions of these are provided by SemiParSampleSel. Package matchingMarkets corrects for selection bias when the sample is the result of a stable matching process (e.g., a group formation or college admissions problem).
· Truncated responses: crch for truncated (and potentially heteroscedastic) Gaussian, logistic, and t responses. Homoscedastic Gaussian responses are also available in truncreg.
· Fraction and proportion responses: Fractional response models are in frm. Beta regression for responses in (0, 1) is in betareg and gamlss.
· Miscellaneous: Further, more refined tools for microeconometrics are provided in the micEcon family of packages: Analysis with Cobb-Douglas, translog, and quadratic functions is in micEcon; the constant elasticity of substitution (CES) function is in micEconCES; the symmetric normalized quadratic profit (SNQP) function is in micEconSNQP. The almost ideal demand system (AIDS) is in micEconAids. Stochastic frontier analysis (SFA) is in frontier and certain special cases also in sfa. Semiparametric SFA is available in semsfa and spatial SFA in spfrontier and ssfa. The package bayesm implements a Bayesian approach to microeconometrics and marketing. Estimation and marginal effect computations for multivariate probit models can be carried out with mvProbit. Inference for relative distributions is contained in package reldist.
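The GLM-based models described above can be sketched with glm() alone. The example below is a hedged illustration that uses the built-in mtcars and warpbreaks data sets (both are assumptions of this sketch, picked only because they ship with R): a logit and a probit model for a binary response, and a Poisson model for a count response.

```r
# Binary response: transmission type (am, 0/1) explained by car weight.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
probit_fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "probit"))

# Predicted probability for a hypothetical car with wt = 3 (i.e., 3000 lbs).
p_hat <- predict(logit_fit, newdata = data.frame(wt = 3), type = "response")
print(p_hat)

# Count response: number of warp breaks explained by wool type and tension.
pois_fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

print(summary(logit_fit)$coefficients)
print(summary(probit_fit)$coefficients)
print(summary(pois_fit)$coefficients)
```

Logit and probit coefficients are on different scales, so they are usually compared through predicted probabilities or marginal effects rather than directly.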
Instrumental variables
· Basic instrumental variables (IV) regression: Two-stage least squares (2SLS) is provided by ivreg() in AER. Other implementations are in tsls() in package sem, in ivpack, and lfe (with particular focus on multiple group fixed effects).
· Binary responses: An IV probit model via GLS estimation is available in ivprobit. The LARF package estimates local average response functions for binary treatments and binary instruments.
· Panel data: Certain basic IV models for panel data can also be estimated with standard 2SLS functions (see above). Dedicated IV panel data models are provided by ivfixed (fixed effects) and ivpanel (between and random effects).
· Miscellaneous: REndo fits linear models with an endogenous regressor using various latent instrumental variable approaches. ivbma estimates Bayesian IV models with conditional Bayes factors. ivlewbel implements the Lewbel approach based on GMM estimation of triangular systems using heteroscedasticity-based IVs.
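To make the idea behind 2SLS concrete, here is a small base-R sketch on simulated data. The data-generating process, the variable names, and the true coefficient of 2 are all assumptions made for this illustration. The two stages are run by hand with lm(); this reproduces the 2SLS point estimate, but the standard errors reported by the second-stage lm() are not valid, which is one reason to prefer a dedicated function such as ivreg() in AER for real work.

```r
set.seed(1)
n <- 1000
z <- rnorm(n)                      # instrument: affects x, unrelated to u
u <- rnorm(n)                      # structural error
x <- 0.8 * z + 0.5 * u + rnorm(n)  # endogenous regressor (correlated with u)
y <- 1 + 2 * x + u                 # true slope on x is 2

# Naive OLS is biased upward here because cov(x, u) > 0.
ols_fit <- lm(y ~ x)

# Stage 1: project the endogenous regressor on the instrument.
stage1 <- lm(x ~ z)
x_hat <- fitted(stage1)

# Stage 2: regress the outcome on the fitted values from stage 1.
stage2 <- lm(y ~ x_hat)

b_ols <- coef(ols_fit)[["x"]]
b_2sls <- coef(stage2)[["x_hat"]]
print(c(OLS = b_ols, TSLS = b_2sls))
```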
Panel data models
· Panel-corrected standard errors: A simple approach for panel data is to fit the pooling (or independence) model (e.g., via lm() or glm()) and only correct the standard errors. Different types of panel-corrected standard errors are available in multiwayvcov, clusterSEs, pcse, clubSandwich, plm, and geepack. The latter two require estimation of the pooling/independence models via plm() and geeglm() from the respective packages (which also provide other types of models, see below).
· Linear panel models: plm, providing a wide range of within, between, and random-effect methods (among others) along with corrected standard errors, tests, etc. Another implementation of several of these models is in Paneldata. Various dynamic panel models are available in plm and dynamic panel models with fixed effects in OrthoPanels.
· Generalized estimating equations and GLMs: GEE models for panel data (or longitudinal data in statistical jargon) are in geepack. The pglm package provides estimation of GLM-like models for panel data.
· Mixed effects models: Linear and nonlinear models for panel data (and more general multi-level data) are available in lme4 and nlme.
· Instrumental variables: ivfixed and ivpanel, see also above.
· Heterogeneous time trends: phtt offers the possibility of analyzing panel data with large dimensions n and T and can be considered when the unobserved heterogeneity effects are time-varying.
· Miscellaneous: Multiple group fixed effects are in lfe. Autocorrelation and heteroscedasticity corrections are available in wahc and panelAR. PANIC tests of nonstationarity are in PANICr. Threshold regression and unit root tests are in pdR. The panel data approach method for program evaluation is available in pampe.
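The within (fixed-effects) estimator at the core of linear panel models can be illustrated in base R. The sketch below simulates a small balanced panel (the dimensions, names, and the true slope of 1.5 are assumptions of this example) and shows that demeaning by individual gives the same slope as a regression with one dummy per individual. Packages such as plm wrap this up together with proper degrees-of-freedom handling and tests.

```r
set.seed(42)
n_id <- 50                                  # individuals
n_t <- 6                                    # time periods per individual
id <- rep(seq_len(n_id), each = n_t)
alpha <- rnorm(n_id)[id]                    # unobserved individual effect
x <- alpha + rnorm(n_id * n_t)              # regressor correlated with alpha
y <- 1.5 * x + alpha + rnorm(n_id * n_t)    # true slope is 1.5

# Pooled OLS ignores alpha and is biased because x and alpha are correlated.
pooled <- lm(y ~ x)

# Least-squares dummy variables: one intercept per individual.
lsdv <- lm(y ~ x + factor(id))

# Within transformation: subtract individual means, then run OLS.
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
within_fit <- lm(y_dm ~ x_dm - 1)

b_pooled <- coef(pooled)[["x"]]
b_lsdv <- coef(lsdv)[["x"]]
b_within <- coef(within_fit)[["x_dm"]]
print(c(pooled = b_pooled, LSDV = b_lsdv, within = b_within))
```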
Further regression models
· Nonlinear least squares modeling: nls() in package stats.
· Quantile regression: quantreg (including linear, nonlinear, censored, locally polynomial and additive quantile regressions).
· Generalized method of moments (GMM) and generalized empirical likelihood (GEL): gmm.
· Spatial econometric models: The Spatial view gives details about handling spatial data, along with information about (regression) modeling. In particular, spatial regression models can be fitted using spdep and sphet (the latter using a GMM approach). splm is a package for spatial panel models. Spatial probit models are available in spatialprobit.
· Bayesian model averaging (BMA): A comprehensive toolbox for BMA is provided by BMS including flexible prior selection, sampling, etc. A different implementation is in BMA for linear models, generalized linear models and survival models (Cox regression).
· Linear structural equation models: lavaan and sem. See also the Psychometrics task view for more details.
· Simultaneous equation estimation: systemfit.
· Nonparametric kernel methods: np.
· Linear and nonlinear mixed-effect models: nlme and lme4.
· Generalized additive models (GAMs): mgcv, gam, gamlss and VGAM.
· Extreme bounds analysis: ExtremeBounds.
· Miscellaneous: The packages VGAM, rms and Hmisc provide several tools for extended handling of (generalized) linear regression models. Zelig is a unified easy-to-use interface to a wide range of regression models.
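Of the tools listed above, nls() is available without installing anything, so it makes a convenient illustration. The following sketch fits an exponential growth curve to simulated data; the functional form, the true parameter values (a = 2, b = 0.3), and the trick of taking starting values from a log-linear regression are choices made for this example rather than anything prescribed by the overview.

```r
set.seed(123)
x <- seq(0, 5, length.out = 60)
y <- 2 * exp(0.3 * x) + rnorm(60, sd = 0.1)   # true a = 2, b = 0.3

# Starting values from a log-linear approximation: log(y) ~ log(a) + b * x.
start_fit <- lm(log(y) ~ x)
start_vals <- list(a = exp(coef(start_fit)[[1]]), b = coef(start_fit)[[2]])

# Nonlinear least squares on the original scale.
nls_fit <- nls(y ~ a * exp(b * x), start = start_vals)
print(summary(nls_fit))

a_hat <- coef(nls_fit)[["a"]]
b_hat <- coef(nls_fit)[["b"]]
```

Reasonable starting values matter for nls(); when they are far from the optimum, the default Gauss-Newton iterations may fail to converge.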
Time series data and models
· The TimeSeries task view provides much more detailed information about both basic time series infrastructure and time series models. Here, only the most important aspects relating to econometrics are briefly mentioned. Time series models for financial econometrics (e.g., GARCH, stochastic volatility models, or stochastic differential equations, etc.) are described in the Finance task view.
· Infrastructure for regularly spaced time series: The class "ts" in package stats is R's standard class for regularly spaced time series (especially annual, quarterly, and monthly data). It can be coerced back and forth without loss of information to "zooreg" from package zoo.
· Infrastructure for irregularly spaced time series: zoo provides infrastructure for both regularly and irregularly spaced time series (the latter via the class "zoo") where the time information can be of arbitrary class. This includes daily series (typically with "Date" time index) or intra-day series (e.g., with "POSIXct" time index). An extension based on zoo geared towards time series with different kinds of time index is xts. Further packages aimed particularly at finance applications are discussed in the Finance task view.
· Classical time series models: Simple autoregressive models can be estimated with ar() and ARIMA modeling and Box-Jenkins-type analysis can be carried out with arima() (both in the stats package). An enhanced version of arima() is in forecast.
· Linear regression models: A convenience interface to lm() for estimating OLS and 2SLS models based on time series data is dynlm. Linear regression models with AR error terms can be fitted via GLS using gls() from nlme.
· Structural time series models: Standard models can be fitted with StructTS() in stats. Further packages are discussed in the TimeSeries task view.
· Filtering and decomposition: decompose() and HoltWinters() in stats. The basic function for computing filters (both rolling and autoregressive) is filter() in stats. Many extensions to these methods, in particular for forecasting and model selection, are provided in the forecast package.
· Vector autoregression: Simple models can be fitted by ar() in stats; more elaborate models are provided in package vars along with suitable diagnostics, visualizations, etc. A Bayesian approach is available in MSBVAR.
· Unit root and cointegration tests: urca, tseries, CADFtest. See also pco for panel cointegration tests.
· Miscellaneous:
o tsDyn - Threshold and smooth transition models.
o midasr - MIDAS regression and other econometric methods for mixed frequency time series data analysis.
o gets - GEneral-To-Specific (GETS) model selection for either ARX models with log-ARCH-X errors, or a log-ARCH-X model of the log variance.
o tsfa - Time series factor analysis.
o dlsem - Distributed-lag linear structural equation models.
o apt - Asymmetric price transmission models.
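The base-R pieces mentioned in this section ("ts" objects, ar(), and arima()) fit together roughly as follows. This is a sketch on a simulated AR(1) series; the AR coefficient of 0.6, the quarterly frequency, and the start date are assumptions chosen only for illustration.

```r
set.seed(7)
# Simulate an AR(1) process and store it as a quarterly "ts" object.
y_raw <- as.numeric(arima.sim(model = list(ar = 0.6), n = 300))
y_ts <- ts(y_raw, start = c(2000, 1), frequency = 4)

# Simple autoregression (Yule-Walker by default) with the order fixed at 1.
ar_fit <- ar(y_ts, order.max = 1, aic = FALSE)

# The same model estimated by maximum likelihood via arima().
arima_fit <- arima(y_ts, order = c(1, 0, 0))

# Forecast the next four quarters with standard errors.
fc <- predict(arima_fit, n.ahead = 4)

phi_ar <- ar_fit$ar[1]
phi_arima <- coef(arima_fit)[["ar1"]]
print(c(ar = phi_ar, arima = phi_arima))
print(fc$pred)
```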
Data sets
· Textbooks and journals: Packages AER, Ecdat, and wooldridge contain comprehensive collections of data sets from various standard econometric textbooks as well as several data sets from the Journal of Applied Econometrics and the Journal of Business & Economic Statistics data archives. AER and wooldridge additionally provide extensive sets of examples reproducing analyses from the textbooks/papers, illustrating various econometric methods.
· Canadian monetary aggregates: CDNmoney.
· Penn World Table: pwt provides versions 5.6, 6.x, 7.x. Version 8.x and 9.x data are available in pwt8 and pwt9, respectively.
· Time series and forecasting data: The packages expsmooth, fma, and Mcomp are data packages with time series data from the books 'Forecasting with Exponential Smoothing: The State Space Approach' (Hyndman, Koehler, Ord, Snyder, 2008, Springer) and 'Forecasting: Methods and Applications' (Makridakis, Wheelwright, Hyndman, 3rd ed., 1998, Wiley) and the M-competitions, respectively.
· Empirical Research in Economics: Package erer contains functions and datasets for the book 'Empirical Research in Economics: Growing up with R' (Sun, forthcoming).
· Panel Study of Income Dynamics (PSID): psidR can build panel data sets from the Panel Study of Income Dynamics (PSID).
· US state- and county-level panel data: rUnemploymentData.
· World Bank data and statistics: The wbstats package provides programmatic access to the World Bank API.
Miscellaneous
· Matrix manipulations: As a vector- and matrix-based language, base R ships with many powerful tools for doing matrix manipulations, which are complemented by the packages Matrix and SparseM.
· Optimization and mathematical programming: R and many of its contributed packages provide many specialized functions for solving particular optimization problems, e.g., in regression as discussed above. Further functionality for solving more general optimization problems, e.g., likelihood maximization, is discussed in the Optimization task view.
· Bootstrap: In addition to the recommended boot package, there are some other general bootstrapping techniques available in bootstrap or simpleboot as well as some bootstrap techniques designed for time-series data, such as the maximum entropy bootstrap in meboot or the tsbootstrap() from tseries.
· Inequality: For measuring inequality, concentration and poverty the package ineq provides some basic tools such as Lorenz curves, Pen's parade, the Gini coefficient and many more.
· Structural change: R is particularly strong when dealing with structural changes and changepoints in parametric models, see strucchange and segmented.
· Exchange rate regimes: Methods for inference about exchange rate regimes, in particular in a structural change setting, are provided by fxregime.
· Global value chains: Tools and decompositions for global value chains are in gvc and decompr.
· Regression discontinuity design: A variety of methods are provided in the rdd, rddtools, rdrobust, and rdlocrand packages.
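As one example from this group, a nonparametric bootstrap with the recommended boot package might look like the sketch below. It resamples rows of the built-in mtcars data (an assumption of this example) to get a bootstrap standard error and percentile interval for an OLS slope; the helper name slope_stat and the choice of 500 replications are likewise illustrative only.

```r
library(boot)  # recommended package, normally installed alongside R

set.seed(2024)

# Statistic for boot(): refit the regression on the resampled rows
# and return the slope on wt. (slope_stat is a name made up for this sketch.)
slope_stat <- function(data, idx) {
  coef(lm(mpg ~ wt, data = data[idx, ]))[["wt"]]
}

boot_out <- boot(mtcars, statistic = slope_stat, R = 500)
boot_se <- sd(boot_out$t[, 1])     # bootstrap standard error of the slope

print(boot_out)
print(boot.ci(boot_out, type = "perc"))
```

Row resampling of this kind assumes independent observations; for time series, the dedicated methods named above (meboot, tsbootstrap()) are the more appropriate starting point.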
CRAN packages:
· AER (core)
· aod
· apt
· bayesm
· betareg
· BMA
· BMS
· boot
· bootstrap
· brglm
· CADFtest
· car (core)
· CDNmoney
· censReg
· clubSandwich
· clusterSEs
· crch
· decompr
· dlsem
· dynlm
· Ecdat
· effects
· erer
· expsmooth
· ExtremeBounds
· fma
· forecast (core)
· frm
· frontier
· fxregime
· gam
· gamlss
· geepack
· gets
· glmx
· gmm
· gmnl
· gvc
· Hmisc
· ineq
· intReg
· ivbma
· ivfixed
· ivlewbel
· ivpack
· ivpanel
· ivprobit
· LARF
· lavaan
· lfe
· LinRegInteractive
· lme4
· lmtest (core)
· margins
· MASS
· matchingMarkets
· Matrix
· Mcomp
· meboot
· mfx
· mgcv
· mhurdle
· micEcon
· micEconAids
· micEconCES
· micEconSNQP
· midasr
· mlogit
· mnlogit
· MNP
· MSBVAR
· multiwayvcov
· mvProbit
· nlme
· nnet
· nonnest2
· np
· ordinal
· OrthoPanels
· pampe
· panelAR
· Paneldata
· PANICr
· pco
· pcse
· pdR
· pglm
· phtt
· plm (core)
· pscl
· psidR
· pwt
· pwt8
· pwt9
· quantreg
· Rchoice
· rdd
· rddtools
· rdlocrand
· rdrobust
· reldist
· REndo
· rms
· RSGHB
· rUnemploymentData
· sampleSelection
· sandwich (core)
· segmented
· sem
· SemiParSampleSel
· semsfa
· sfa
· simpleboot
· SparseM
· spatialprobit
· spdep
· spfrontier
· sphet
· splm
· ssfa
· strucchange
· survival
· systemfit
· truncreg
· tsDyn
· tseries (core)
· tsfa
· urca (core)
· vars
· VGAM
· wahc
· wbstats
· wooldridge
· xts
· Zelig
· zoo (core)